Yang Shi

OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models

Feb 04, 2026
Via arXiv

Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks

Feb 02, 2026
Via arXiv

DiaDem: Advancing Dialogue Descriptions in Audiovisual Video Captioning for Multimodal Large Language Models

Jan 27, 2026
Via arXiv

Riemannian Liquid Spatio-Temporal Graph Network

Jan 20, 2026
Via arXiv

CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation

Jan 15, 2026
Via arXiv

MMErroR: A Benchmark for Erroneous Reasoning in Vision-Language Models

Jan 06, 2026
Via arXiv

Detecting Unobserved Confounders: A Kernelized Regression Approach

Jan 01, 2026
Via arXiv

GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models

Dec 17, 2025
Via arXiv

Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling

Dec 14, 2025
Via arXiv

The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss

Dec 09, 2025
Via arXiv